MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems
نویسندگان
چکیده
Many organizations—including academic, research, commercial institutions—have invested heavily in setting up High Performance Computing (HPC) facilities for running computational science applications. On the other hand, the Apache Hadoop software—after emerging in 2005— has become a popular, reliable, and scalable open-source framework for processing large-scale data (Big Data). Realizing the importance and significance of Big Data, an increasing number of organizations are investing in relatively cheaper Hadoop clusters for executing their mission critical data processing applications. An issue here is that system administrators at these sites might have to maintain two parallel facilities for running HPC and Hadoop computations. This, of course, is not ideal due to redundant maintenance work and poor economics. This paper attempts to bridge this gap by allowing HPC and Hadoop jobs to co-exist on a single hardware facility. We achieve this goal by exploiting YARN—Hadoop v2.0—that de-couples the computational and resource scheduling part of the Hadoop framework from HDFS. In this context, we have developed a YARN-based reference runtime system for the MPJ Express software that allows executing parallel MPI-like Java applications on Hadoop clusters. The main contribution of this paper is provide Big Data community access to MPI-like programming using MPJ Express. As an aside, this work allows parallel Java applications to perform computations on data stored in Hadoop Distributed File System (HDFS).
منابع مشابه
Design and Implementation of Parallel Debugger and Profiler for MPJ Express
MPJ Express is a messaging system that allows computational scientists to write and execute parallel Java applications on High Performance Computing (HPC) hardware. Despite its successful adoption in the Java HPC community, the MPJ Express software currently does not provide any support for debugging and profiling parallel applications and hence forces its users to rely on manual and tedious de...
متن کاملA Buffering Layer to Support Derived Types and Proprietary Networks for Java HPC
Abstract. MPJ Express is our implementation of MPI-like bindings for Java. In this paper we discuss our intermediate buffering layer that makes use of the so-called direct byte buffers introduced in the Java New I/O package. The purpose of this layer is to support the implementation of derived datatypes. MPJ Express is the first Java messaging library that implements this feature using pure Jav...
متن کاملDevice level communication libraries for high-performance computing in Java
Since its release, the Java programming language has attracted considerable attention from the highperformance computing (HPC) community because of its portability, high programming productivity, and built-in multithreading and networking support. As a consequence, several initiatives have been taken to develop a high-performance Java message-passing library to program distributed memory archit...
متن کاملBig Data at HPC Wales
This paper describes an automated approach to handling Big Data workloads on HPC systems. We describe a solution that dynamically creates a unified cluster based on YARN in an HPC Environment, without the need to configure and allocate a dedicated Hadoop cluster. The end user can choose to write the solution in any combination of supported frameworks, a solution that scales seamlessly from a fe...
متن کاملMPJ Express Meets Gadget: Towards a Java Code for Cosmological Simulations
Gadget-2 is a massively parallel structure formation code for cosmological simulations. In this paper, we present a Java version of Gadget-2. We evaluated the performance of the Java version by running a colliding galaxy simulation and found that it can achieve around 70% of C Gadget-2’s performance.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015